AITopics | core point

Collaborating Authors

core point

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

31421b112e5f7faf4fc577b74e45dab2-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 18:45:00 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

Scalable DBSCAN with Random Projections

Neural Information Processing SystemsOct-9-2025, 22:35:43 GMT

Theoretically, sDBSCAN preserves the DBSCAN's clustering structure under mild

core point, minpt, sdbscan, (15 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

Oversampling and Downsampling with Core-Boundary Awareness: A Data Quality-Driven Approach

Belhaouari, Samir Brahim, Kahalan, Yunis Carreon, Shaffique, Humaira, Belhaouari, Ismael, Islam, Ashhadul

arXiv.org Artificial IntelligenceSep-25-2025

The effectiveness of machine learning models, particularly in unbalanced classification tasks, is often hindered by the failure to differentiate between critical instances near the decision boundary and redundant samples concentrated in the core of the data distribution. In this paper, we propose a method to systematically identify and differentiate between these two types of data. Through extensive experiments on multiple benchmark datasets, we show that the boundary data oversampling method improves the F1 score by up to 10\% on 96\% of the datasets, whereas our core-aware reduction method compresses datasets up to 90\% while preserving their accuracy, making it 10 times more powerful than the original dataset. Beyond imbalanced classification, our method has broader implications for efficient model training, particularly in computationally expensive domains such as Large Language Model (LLM) training. By prioritizing high-quality, decision-relevant data, our approach can be extended to text, multimodal, and self-supervised learning scenarios, offering a pathway to faster convergence, improved generalization, and significant computational savings. This work paves the way for future research in data-efficient learning, where intelligent sampling replaces brute-force expansion, driving the next generation of AI advancements. Our code is available as a Python package at https://pypi.org/project/adaptive-resampling/ .

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.19856

Country: Europe (1.00)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Unimodal Strategies in Density-Based Clustering

Nir, Oron, Tenenbaum, Jay, Shamir, Ariel

arXiv.org Artificial IntelligenceJun-30-2025

Density-based clustering methods often surpass centroid-based counterparts, when addressing data with noise or arbitrary data distributions common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods regarding the relation between the number of clusters and the neighborhood radius of core points - we empirically show that it is nearly unimodal, and support this claim theoretically in a specific setting. We leverage this property to devise new strategies for finding appropriate values for the radius more efficiently based on the Ternary Search algorithm. This is especially important for large scale data that is high-dimensional, where parameter tuning is computationally intensive. We validate our methodology through extensive applications across a range of high-dimensional, large-scale NLP, Audio, and Computer Vision tasks, demonstrating its practical effectiveness and robustness. This work not only offers a significant advancement in parameter control for density-based clustering but also broadens the understanding regarding the relations between their guiding parameters. Our code is available at https://github.com/oronnir/UnimodalStrategies.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.21695

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Israel (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Dynamic DBSCAN with Euler Tour Sequences

Shin, Seiyun, Shomorony, Ilan, Macgregor, Peter

arXiv.org Artificial IntelligenceMar-11-2025

We propose a fast and dynamic algorithm for Density-Based Spatial Clustering of Applications with Noise (DBSCAN) that efficiently supports online updates. Traditional DBSCAN algorithms, designed for batch processing, become computationally expensive when applied to dynamic datasets, particularly in large-scale applications where data continuously evolves. To address this challenge, our algorithm leverages the Euler Tour Trees data structure, enabling dynamic clustering updates without the need to reprocess the entire dataset. This approach preserves a near-optimal accuracy in density estimation, as achieved by the state-of-the-art static DBSCAN method (Esfandiari et al., 2021) Our method achieves an improved time complexity of $O(d \log^3(n) + \log^4(n))$ for every data point insertion and deletion, where $n$ and $d$ denote the total number of updates and the data dimension, respectively. Empirical studies also demonstrate significant speedups over conventional DBSCANs in real-time clustering of dynamic datasets, while maintaining comparable or superior clustering quality.

algorithm, core point, dist, (15 more...)

arXiv.org Artificial Intelligence

2503.08246

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > Thailand (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Rethinking Deep Clustering Paradigms: Self-Supervision Is All You Need

Shaheena, Amal, Mrabahb, Nairouz, Ksantinia, Riadh, Alqaddoumia, Abdulla

arXiv.org Artificial IntelligenceMar-5-2025

The recent advances in deep clustering have been made possible by significant progress in self-supervised and pseudo-supervised learning. However, the trade-off between self-supervision and pseudo-supervision can give rise to three primary issues. The joint training causes Feature Randomness and Feature Drift, whereas the independent training causes Feature Randomness and Feature Twist. In essence, using pseudo-labels generates random and unreliable features. The combination of pseudo-supervision and self-supervision drifts the reliable clustering-oriented features. Moreover, moving from self-supervision to pseudo-supervision can twist the curved latent manifolds. This paper addresses the limitations of existing deep clustering paradigms concerning Feature Randomness, Feature Drift, and Feature Twist. We propose a new paradigm with a new strategy that replaces pseudo-supervision with a second round of self-supervision training. The new strategy makes the transition between instance-level self-supervision and neighborhood-level self-supervision smoother and less abrupt. Moreover, it prevents the drifting effect that is caused by the strong competition between instance-level self-supervision and clustering-level pseudo-supervision. Moreover, the absence of the pseudo-supervision prevents the risk of generating random features. With this novel approach, our paper introduces a Rethinking of the Deep Clustering Paradigms, denoted by R-DC. Our model is specifically designed to address three primary challenges encountered in Deep Clustering: Feature Randomness, Feature Drift, and Feature Twist. Experimental results conducted on six datasets have shown that the two-level self-supervision training yields substantial improvements.

core point, neighbor, paradigm, (16 more...)

arXiv.org Artificial Intelligence

2503.03733

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Asia > Middle East > Bahrain (0.04)

Genre: Research Report > Promising Solution (0.87)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

DeepCore: Simple Fingerprint Construction for Differentiating Homologous and Piracy Models

Sun, Haifeng, Zhang, Lan, Li, Xiang-Yang

arXiv.org Artificial IntelligenceNov-1-2024

As intellectual property rights, the copyright protection of deep models is becoming increasingly important. Existing work has made many attempts at model watermarking and fingerprinting, but they have ignored homologous models trained with similar structures or training datasets. We highlight challenges in efficiently querying black-box piracy models to protect model copyrights without misidentifying homologous models. To address these challenges, we propose a novel method called DeepCore, which discovers that the classification confidence of the model is positively correlated with the distance of the predicted sample from the model decision boundary and piracy models behave more similarly at high-confidence classified sample points. Then DeepCore constructs core points far away from the decision boundary by optimizing the predicted confidence of a few sample points and leverages behavioral discrepancies between piracy and homologous models to identify piracy models. Finally, we design different model identification methods, including two similarity-based methods and a clustering-based method to identify piracy models using models' predictions of core points. Extensive experiments show the effectiveness of DeepCore in identifying various piracy models, achieving lower missed and false identification rates, and outperforming state-of-the-art methods.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.0038

Country:

North America > United States > California > Alameda County > Oakland (0.04)
Europe > Greece (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > Promising Solution (0.86)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Bi-Filtration and Stability of TDA Mapper for Point Cloud Data

Bungula, Wako, Darcy, Isabel

arXiv.org Machine LearningSep-25-2024

Carlsson, Singh and Memoli's TDA mapper takes a point cloud dataset and outputs a graph that depends on several parameter choices. Dey, Memoli, and Wang developed Multiscale Mapper for abstract topological spaces so that parameter choices can be analyzed via persistent homology. However, when applied to actual data, one does not always obtain filtrations of mapper graphs. DBSCAN, one of the most common clustering algorithms used in the TDA mapper software, has two parameters, \textbf{$\epsilon$} and \textbf{MinPts}. If \textbf{MinPts = 1} then DBSCAN is equivalent to single linkage clustering with cutting height \textbf{$\epsilon$}. We show that if DBSCAN clustering is used with \textbf{MinPts $>$ 2}, a filtration of mapper graphs may not exist except in the absence of free-border points; but such filtrations exist if DBSCAN clustering is used with \textbf{MinPts = 1} or \textbf{2} as the cover size increases, \textbf{$\epsilon$} increases, and/or \textbf{MinPts} decreases. However, the 1-dimensional filtration is unstable. If one adds noise to a data set so that each data point has been perturbed by a distance at most \textbf{$\delta$}, the persistent homology of the mapper graph of the perturbed data set can be significantly different from that of the original data set. We show that we can obtain stability by increasing both the cover size and \textbf{$\epsilon$} at the same time. In particular, we show that the bi-filtrations of the homology groups with respect to cover size and $\epsilon$ between these two datasets are \textbf{2$\delta$}-interleaved.

cluster cover, filtration, minp ts, (13 more...)

arXiv.org Machine Learning

2409.1736

Country:

North America > United States > Wisconsin > La Crosse County > La Crosse (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Iowa (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

What's the Opposite of a Face? Finding Shared Decodable Concepts and their Negations in the Brain

Efird, Cory, Murphy, Alex, Zylberberg, Joel, Fyshe, Alona

arXiv.org Artificial IntelligenceMay-27-2024

Prior work has offered evidence for functional localization in the brain; different anatomical regions preferentially activate for certain types of visual input. For example, the fusiform face area preferentially activates for visual stimuli that include a face. However, the spectrum of visual semantics is extensive, and only a few semantically-tuned patches of cortex have so far been identified in the human brain. Using a multimodal (natural language and image) neural network architecture (CLIP) we train a highly accurate contrastive model that maps brain responses during naturalistic image viewing to CLIP embeddings. We then use a novel adaptation of the DBSCAN clustering algorithm to cluster the parameters of these participant-specific contrastive models. This reveals what we call Shared Decodable Concepts (SDCs): clusters in CLIP space that are decodable from common sets of voxels across multiple participants. Examining the images most and least associated with each SDC cluster gives us additional insight into the semantic properties of each SDC. We note SDCs for previously reported visual features (e.g. orientation tuning in early visual cortex) as well as visual semantic concepts such as faces, places and bodies. In cases where our method finds multiple clusters for a visuo-semantic concept, the least associated images allow us to dissociate between confounding factors. For example, we discovered two clusters of food images, one driven by color, the other by shape. We also uncover previously unreported areas such as regions of extrastriate body area (EBA) tuned for legs/hands and sensitivity to numerosity in right intraparietal sulcus, and more. Thus, our contrastive-learning methodology better characterizes new and existing visuo-semantic representations in the brain by leveraging multimodal neural network representations and a novel adaptation of clustering algorithms.

negative image, participant, representation, (14 more...)

arXiv.org Artificial Intelligence

2405.17663

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Add feedback

Block-Diagonal Guided DBSCAN Clustering

Zhao, Weibing

arXiv.org Artificial IntelligenceApr-26-2024

Cluster analysis plays a crucial role in database mining, and one of the most widely used algorithms in this field is DBSCAN. However, DBSCAN has several limitations, such as difficulty in handling high-dimensional large-scale data, sensitivity to input parameters, and lack of robustness in producing clustering results. This paper introduces an improved version of DBSCAN that leverages the block-diagonal property of the similarity graph to guide the clustering procedure of DBSCAN. The key idea is to construct a graph that measures the similarity between high-dimensional large-scale data points and has the potential to be transformed into a block-diagonal form through an unknown permutation, followed by a cluster-ordering procedure to generate the desired permutation. The clustering structure can be easily determined by identifying the diagonal blocks in the permuted graph. We propose a gradient descent-based method to solve the proposed problem. Additionally, we develop a DBSCAN-based points traversal algorithm that identifies clusters with high densities in the graph and generates an augmented ordering of clusters. The block-diagonal structure of the graph is then achieved through permutation based on the traversal order, providing a flexible foundation for both automatic and interactive cluster analysis. We introduce a split-and-refine algorithm to automatically search for all diagonal blocks in the permuted graph with theoretically optimal guarantees under specific cases. We extensively evaluate our proposed approach on twelve challenging real-world benchmark clustering datasets and demonstrate its superior performance compared to the state-of-the-art clustering method on every dataset.

algorithm, graph, representation, (16 more...)

arXiv.org Artificial Intelligence

2404.01341

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > Dominican Republic > Samaná > Samaná (0.04)
Asia (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback